Studies of collective human behavior in
the social sciences, often grounded in de- 
tails of actions by individuals, have much 
to offer 'social' models from the physi- 
cal sciences concerning elegant statistical 
regularities. Drawing on behavioral stud- 
ies of social influence, we present a par- 
simonious, stochastic model, which gener- 
ates an entire family of real-world right- 
skew socio-economic distributions, includ- 
ing exponential, winner-take-all, power 
law tails of varying exponents and power 
laws across the whole data. The widely 
used Barabási-Albert model of preferential
attachment is simply a special case
of this much more general model. In ad- 
dition, the model produces the continu- 
ous turnover observed empirically within 
those distributions. Previous preferential 
attachment models have generated specific
distributions with turnover using arbitrary
add-on rules, but turnover is an
inherent feature of our model. The model 
also replicates an intriguing new relationship,
observed across a range of empirical
studies, between the power law exponent
and the proportion of data represented. 

Since Pareto, the right-skew nature of income 
distribution has been known, while similar skew- 
ness in the frequencies of words, scientific papers, 
and city sizes has been recognised for decades
[1] [2] [3] [4] [5]. In the statistical sciences, particularly
statistical physics, a recent explosion of
interest in such distributions for social phenomena
includes internet links [6] [7], author citations
[8], sexual partners [3], and firm sizes and their
extinctions [10] [11], amongst many others.

With socio-economic phenomena, the detailed 
debate over the exact form of these distributions 
- for example, power laws versus similar fat- 
tailed functions such as the stretched exponential 
[1] [2] [13] - often involves the characterisation of
the distribution at a point in time, and often ne- 
glects the importance of dynamics and the un- 
derlying behaviour [121 HI] which gives rise to 
changes over time within any given distribution. 

Simon [3] argued that right-skew distributions 
were so widespread that their key similarity was 
likely to be 'in the underlying probability mechanisms'
that led to their generation. This is
clearly the case but, as noted in the social sciences
for over a century [12], it is inherently a
description of macro phenomena, without an ex- 
planation for the individual behaviour that gives 
rise to emergent properties. Also, with socio- 
economic phenomena, the discussion over the ex- 
act form of these distributions - true power laws 
versus similar fat-tailed functions [1] [2] [13] -
often neglects the importance of dynamics and
their underlying behaviour [13].

We thus propose a model based upon individ- 
ual agents who are boundedly rational and are 
influenced by the behaviour of other agents in 
terms of their decision-making. In other words, 
the agents act with social purpose, which is fun- 
damentally different from physical or biological 
phenomena where the agents (or particles) are 
incapable of intent. The model provides four ad- 
vances on previous models: 

(a) It can generate a wide range of the right- 
skew distributions observed in cultural, economic 
and social situations from different combinations 
of its two parameters. 

(b) The widely used Barabási-Albert (B-A)
model of preferential attachment is simply a 
special case of this much more general model. 

(c) In terms of power law fits, there are two essential
statistics, the exponent α and the fraction
f of the total observations over which the power
law is believed to hold. The model can replicate
both the observed exponents α and the fractions f
from real-world observations [1] [2].

(d) Many real-world right-skew distributions 
exhibit constant turnover in the rankings of their 
constituents even if their functional form is time-invariant
[14] [15]. Unlike the B-A model [7],
our model is capable of generating such turnover 
without recourse to self-fulfilling rules such as 
'aging' or variable 'fitness' of the individual ele- 
ments [16]. 

1 The social influence model 

Consider a model populated initially by N agents 
located in some space such as the sequence of real 
numbers. Depending on the phenomenon, each
location is an abstract representation; it could
refer to the city where a firm chooses to locate 
itself, but it could equally well refer to the prod- 
uct a consumer chooses, or the idea or fashion 
that a person follows. 

The model proceeds in a series of steps. In 
each step, n new agents enter the model, where 
the number n is between 1 and N and fixed as a 
parameter in each solution of the model. With 
probability 1 - μ, an agent copies the choice of
location from that of an existing agent within the
previous m time steps, or else, with probability
μ, the agent innovates by choosing a unique new
location at random. In other words, the agent 
either copies an existing agent from the last m 
steps, or chooses a new location. 
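
The steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name simulate, the integer labelling of locations, the assumption that the N initial agents start at distinct locations, and the treatment of those initial agents as the first remembered step are all choices made here for concreteness.

```python
import random
from collections import Counter, deque


def simulate(N=1000, n=100, mu=0.01, m=10, steps=200, seed=0):
    """Run the copy-or-innovate process and return location popularity.

    N     -- number of initial agents (assumed to start at distinct locations)
    n     -- new agents entering at each step
    mu    -- probability that a new agent innovates
    m     -- memory in time steps; None stands for the 'all' case
    steps -- number of time steps to run
    Returns a Counter mapping location -> number of remembered agents there.
    """
    rng = random.Random(seed)
    next_location = N  # labels 0..N-1 are taken by the initial agents
    # Each entry of the window holds the choices made in one time step;
    # a deque with maxlen=m forgets steps older than m automatically.
    window = deque([list(range(N))], maxlen=m)
    for _ in range(steps):
        pool = [loc for step in window for loc in step]
        newcomers = []
        for _ in range(n):
            if rng.random() < mu:
                newcomers.append(next_location)  # innovate: unique new location
                next_location += 1
            else:
                newcomers.append(rng.choice(pool))  # copy a remembered agent
        window.append(newcomers)
    return Counter(loc for step in window for loc in step)


if __name__ == "__main__":
    counts = simulate(N=1000, n=100, mu=0.01, m=10, steps=200)
    sizes = sorted(counts.values(), reverse=True)
    print(sizes[:10])  # sizes of the ten most popular locations
```

Copying a uniformly chosen remembered agent means a location is chosen with probability proportional to its current popularity within the memory window, which is why no explicit popularity weights appear in the code.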

Here we restrict our exploration to two key 
parameters of the model, m and μ, by choosing
convenient values for N and n. The 'memory'
parameter m determines how many previous time
steps of other agents' decisions an agent
looks back over when making its decision. The
'innovation' parameter μ determines the fraction
of the agents who decide to take a completely 
new decision rather than replicating one of the 
decisions made by other agents. 

2 Variety of distributions 

A specific version of the model, with m = 1 
(i.e., memory only of the immediately preced- 
ing step), is known in population genetics and 
physics [17] [18]. For the special case of n = N
and m = 1, analytical solutions demonstrate a
power-law distribution [17] for Nμ equal to or
slightly greater than 1. For m = 1 and Nμ ≪ 1,
this gradually converges on a winner-take-all
distribution as Nμ approaches zero.

The case where m = all is a further special 
category of the model, where extinction or ob- 
solescence does not occur. In this case, we can 
achieve different power law slopes by varying n 
and μ. Figure 1 shows, for example, that we can
match the B-A preferential attachment model
[7], obtaining a power law exponent α ≈ 3 over
the entire distribution, by using m = all with
N = 1, n = 10, t = 20,000, and μ = 0.6.

Figure 1: The power law generated from the preferential
attachment version of the model, shown as the
probability distribution for a typical model run
using N = 1, n = 10, t = 20,000, μ = 0.6, and
m = all (where the generated sizes are logarithmically
binned). The exponent for the power
law is 2.9 (r² = 0.996), matching that reported
(also by least-squares regression) for preferential
attachment models [?].
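
Figure 1 is described as using logarithmically binned sizes. A minimal sketch of one such binning follows, assuming base-2 bins and positive integer sizes. The source does not state the bin base, and log_binned_density is a hypothetical helper name.

```python
from collections import Counter


def log_binned_density(sizes):
    """Bin positive integer sizes into [1,2), [2,4), [4,8), ... and return
    a list of (geometric bin centre, probability density) pairs."""
    if not sizes:
        return []
    total = len(sizes)
    # For an integer s >= 1, s.bit_length() - 1 equals floor(log2(s)) exactly.
    bin_counts = Counter(s.bit_length() - 1 for s in sizes)
    result = []
    for k in sorted(bin_counts):
        low, high = 2 ** k, 2 ** (k + 1)
        width = high - low            # number of integers in the bin
        centre = (low * high) ** 0.5  # geometric midpoint, suited to log axes
        result.append((centre, bin_counts[k] / (total * width)))
    return result
```

Dividing each count by the bin width matters: without it, the widening bins would flatten the apparent slope on a log-log plot.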

For socio-cultural phenomena, however, we ex- 
pect memory to be limited, and thus m in general 
to take values below the special case of 'all'. So 
while we define the model to allow m to take any 
value between 1 and all, we explore here a limited
range, from m = 1 to m = 100 time steps. The
combined effect of varying
m along with varying the innovation parameter
μ generates both a wide range of right-skew
distributional forms and turnover of rankings of
locations within those distributions. Considerable
anthropological and socio-economic evidence
[19] [20] [21] [22] [23] suggests that plausible values of
μ are no greater than 0.1.

Figure 2a plots typical solutions of the model 
using acceptable values of μ, while varying m
(holding N = 1000, n = 100 and showing the results
at time step 1000). Aside from the selected
results shown in this figure, the model produces
additional results ranging from a winner-take-all
outcome, to a power law over the entire distribution
(exponent α ≈ 1.5), to a power law with
varying exponent fitted to the tail. Figure 2b illustrates
how the model parameters can be selected
so that the results match real-world right-skew 
distributions, such as religions, website subscrip- 
tions, word use, names, and author citations. 

3 Regularity in the long tail 

Table 1 lists power law tail exponents α for various
recently collated social data sets [1] [2] along
with the fraction f (= n_tail/n) of total observations
in the tail. A striking, and previously
unreported, feature of these estimates is the
relationship between α and f: the data
reveal a clear inverse correlation. The smaller
the fraction f of the distribution best fit by a
power law tail [24], the larger the exponent α of
that tail. The least-squares fit is α = 1.54 f^(-0.156)
(r² = 0.952).
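
The reported fit can be checked directly from the α and f columns of Table 1. The sketch below assumes ordinary least squares on log α against log f and leaves out the email address book row, which the caption of Figure 3 treats as an outlier. The function name fit_power_relation is hypothetical.

```python
import math

# (f, alpha) pairs taken from Table 1, without the email address book row.
TABLE_1 = [
    (0.609, 1.7), (0.379, 1.8), (0.157, 1.95), (0.030, 2.37),
    (0.060, 2.4), (0.087, 2.5), (0.008, 3.16), (0.002, 4.3),
]


def fit_power_relation(pairs):
    """Fit alpha = c * f**b by least squares in log-log space.

    Returns (c, b, r_squared).
    """
    xs = [math.log(f) for f, _ in pairs]
    ys = [math.log(a) for _, a in pairs]
    n = len(pairs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                      # slope = exponent of f
    c = math.exp(mean_y - b * mean_x)  # intercept back-transformed to prefactor
    return c, b, sxy ** 2 / (sxx * syy)


if __name__ == "__main__":
    print(fit_power_relation(TABLE_1))
```

On these eight rows the regression gives a prefactor close to 1.54, an exponent close to -0.156 and r² close to 0.95, in line with the fit reported in the text.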

Figure 3 plots this relationship in the empirical
data along with the least squares fit using the
model, as solved 100 times for each of μ = 0.05,
0.06 and 0.07, with m = 30 in each case (and
N = 1000, n = 100, t = 1,000). The results
show α = 1.56 f^(…) (r² = 0.975), very similar
to the data-based relationship.
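
Each model run in such a comparison has to be reduced to a pair (α, f). The caption of Figure 3 says only that exponents were estimated by maximum likelihood, so the sketch below is an assumption: it uses the continuous maximum-likelihood estimator with a cutoff x_min supplied by the caller, and the name tail_fit is hypothetical. For integer-valued sizes a discrete correction may be preferable, and the choice of x_min is itself a fitting step not shown here.

```python
import math


def tail_fit(sizes, x_min):
    """Estimate the tail exponent alpha and the tail fraction f.

    Uses the continuous maximum-likelihood estimator
    alpha = 1 + n_tail / sum(ln(x / x_min)) over all x >= x_min.
    """
    tail = [x for x in sizes if x >= x_min]
    log_sum = sum(math.log(x / x_min) for x in tail)
    if log_sum == 0:
        raise ValueError("need at least one observation above x_min")
    alpha = 1.0 + len(tail) / log_sum
    f = len(tail) / len(sizes)
    return alpha, f


if __name__ == "__main__":
    import random

    rng = random.Random(42)
    true_alpha = 2.5
    # Inverse-transform sampling from a pure power law with x_min = 1.
    sample = [(1.0 - rng.random()) ** (-1.0 / (true_alpha - 1.0))
              for _ in range(20000)]
    print(tail_fit(sample, x_min=1.0))  # alpha should come out near 2.5
```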

4 Distribution of turnover 

The model also produces continual turnover 
through time for any given distribution as 
demonstrated by the distributions of lifespans 
within ranked lists as in Figure 4a. This resem- 
bles the lifespans of real world social and eco- 
nomic fat-tail distributions in Figure 4b. The 
memory parameter m again expands the power
of the model. Although turnover has already
been demonstrated [17] for the special case m = 1,
different values of m are needed to account
for empirically observed turnover (Supplementary
Information shows distributions generated

Table 1: Power-law fits determined by [2] among socio-cultural data sets.
Parameters include the number of observations n, the maximum observed value x_max,
the number of observations in the tail n_tail, and the minimum value in the tail x_min.

Quantity                       n        x_max    x_min          α             n_tail        f = n_tail/n
intensity of wars              115      382      2.1 ± 3.5      1.7 ± 0.2     70 ± 14       0.609
religious followers (×10^6)    103      1050     3.85 ± 1.60    1.8 ± 0.1     39 ± 26       0.379
word count                     18855    14086    7 ± 2          1.95 ± 0.02   2958 ± 987    0.157
city population (×10^3)        19447    8009     52.5 ± 11.9    2.37 ± 0.08   580 ± 177     0.030
terrorist attack severity      9101     2749     12 ± 4         2.4 ± 0.2     547 ± 1663    0.060
surname frequency (×10^3)      2753     2502     112 ± 41       2.5 ± 0.2     239 ± 215     0.087
paper citations                415229   8904     160 ± 35       3.16 ± 0.06   3455 ± 1859   0.008
email address books            4581     333      57 ± 21        3.5 ± 0.6     196 ± 449     0.043
papers authored                401455   1416     133 ± 13       4.3 ± 0.1     988 ± 377     0.002

for increasing memory m with order-of-magnitude
changes in the value of μ).
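
The lifespans within ranked lists discussed in this section can be measured from a sequence of popularity snapshots. The sketch below is a minimal version under stated assumptions: it counts the total number of time steps each location spends in the top list (not only consecutive runs), breaks ties by label, and uses the hypothetical helper names top_list_lifespans and cumulative_distribution.

```python
from collections import Counter


def top_list_lifespans(history, top=5):
    """Count, for each location, the number of time steps spent in the top list.

    history -- sequence of dicts mapping location -> size, one per time step
    top     -- length of the ranked list (5 matches the 'top 5' of Figure 4)
    """
    spans = Counter()
    for sizes in history:
        # Rank by size, largest first; ties are broken by label (an assumption).
        ranked = sorted(sizes, key=lambda loc: (-sizes[loc], loc))
        spans.update(ranked[:top])
    return spans


def cumulative_distribution(spans):
    """Return [(t, fraction of listed locations with lifespan >= t)]."""
    values = sorted(spans.values())
    total = len(values)
    return [(t, sum(v >= t for v in values) / total)
            for t in sorted(set(values))]
```

For example, with history = [{'a': 5, 'b': 3, 'c': 1}, {'a': 4, 'c': 6, 'b': 1}, {'c': 2, 'd': 9, 'a': 1}] and top=2, locations a and c each spend two steps in the list while b and d spend one.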

5 Discussion 

The model we have presented can generate not 
only a wide range of long-tailed distributions but 
a constant turnover of the constituent agents 
within any given overall rank-size distribution. 
It is also able to replicate a newly-identified em- 
pirical relationship whereby the power law expo- 
nent increases as the proportion of data in the 
tail falls. 

The model is quite general, despite using only 
two parameters. Varying the parameter values 
can yield a range of distributions, such as a power 
law over the whole sample, a power law only in 
the tail, and a winner-take-all outcome. Since 
the parameter m represents memory and μ represents
innovation in modelled decision-making,
the real-world relationship between α and f in
Figure 3 may result from a variation in related 
parameters among the different contexts of hu- 
man decision-making. We conjecture that the 
continuous relationship observed in Figure 3 sug- 
gests that socio-economic power law distribu- 
tions may form a continuum resulting from a 
generalised process with limited memory. In contrast
to the special m = all case (Figure 1), when
model runs with limited memory yield a power
law over the entire distribution (f = 1), it is only
with exponent α close to 1.5 (Figure 3).

This combination of results makes this model 
unique among the many alternatives that can 
produce power laws. The most commonly pro- 
posed processes such as preferential attachment, 
proportionate effect based on Gibrat's princi- 
ple, the 'Matthew effect' and the Yule process 
[1] [2] [11] [25] [26], produce power laws from the
positive feedback introduced by interactions be- 
tween individual agents. But these 'rich get 
richer' models have not been able to account for 
flux in the constituents of the ranked distribution 
[27], either when growth is one of strict preferential
attachment or even when growth is proportionate
to a stochastic rate independent of size
[28]. Even though "dynamical problems lie at
the forefront" of network science [16], in most 
network models, existing connections affect fu- 
ture connections such that change does not occur 
naturally, but only with imposed modifications. 

Social scientists have been critical of mod- 
elling social and economic data by mapping onto 
known phenomena in physics without consid- 
ering realistic behavioural motivations of the 
agents [15] [29] [30]. As a step in this direction,
our model captures two fundamental motivations,
the imitation of others and novelty in
invention.

Figure 2: Log-log plots of rank and size, (a)
for typical model solutions with N = 1000, n = 100,
t = 1000 and: μ = 0.01, m = 1 (black);
μ = 0.01, m = 100 (red); μ = 0.08, m = 100
(white); μ = 0.0001, m = 2 (green). (b)
for real-world top 100 ranked lists (dots) versus
model results (lines). Top 100 lists include
[38]: male baby name frequency (per million) in
the 1990 US census (blue), RSS feed subscriptions
2001-2008 (orange), English words (red),
cited economists 1993-2003 (purple), and religions
in thousands of adherents (green). With
N = 1000, the model fits were made with μ =
0.001, m = 50, n = 200, t = 4000 for names,
μ = 0.00002, m = 6, n = 2500, t = 10000 for RSS
feeds, μ = 0.00025, m = 85, n = 100, t = 1100 for
cited economists, μ = 0.004, m = 4, n = 450, t =
8000 for words, and μ = 0.0007, m = 2, n =
100, t = 4000 for religions.

Figure 3: The power law tail exponent α versus
the fraction f of total observations represented
by the tail. The dots show power law tails calculated
for various real-world socio-cultural data
sets (see Table 1 for values and errors), whose
relationship (dashed grey curve) can be approximated
by α = 1.54 f^(-0.156) (r² = 0.952 except
for the outlier - the open circle - from email
lists). The thin red curve shows the least squares
fit from 300 runs of our theoretical model, which
gives α = 1.56 f^(…) (r² = 0.975). Exponents
have been estimated using maximum likelihood

Figure 4: Life-spans of individual locations, (a)
Typical model runs, showing the cumulative distribution
of number of time steps spent in the
top 5 for model runs of 1000 time steps with
N = 1000, n = 100, and m = 1. (b) Life-spans
of UK Number One Hits [35] for 1956-2007 (open
circles), versus the model, m = …, μ = 0.1 (blue
line), and t years in the Top 5 US boys' names
[40], 1907-2006 (filled circles) versus the model,
m = 10, μ = 0.001 (red line). Since the temporal
units are arbitrary, the modelled lifespans were
divided by 2 to match the albums, and divided
by 10 to match the names (which on the log-log
plot slides the distribution to the left).

Compared to similar, less flexible versions of 
this model [15], a crucial new variable appears to
be the memory m, which reflects different time
frames to which agents will refer in different contexts.
In pure fashion markets such as
popular music, for example [31], agents take into
account only the most recent decisions of others
and hardly ever those of several months or even
weeks ago. However, in choosing where to locate
geographically, for example, a firm or a person in
a city will implicitly be using information from 
many previous time steps with respect to the de- 
cisions made by others. 

The ability to generate a range of long-tailed distributions
with dynamic turnover distinguishes
this model from the standard socio-economic
science model of individual rational behaviour,
where social influence is the exception
to the rule (as in, for example, 'irrational' stock
market bubbles or real estate crises). With its
unrealistic psychological assumptions [32] and
inconsistencies with experimental results [33],
the standard model suffers from a neglect of social
influence, even in its modern form, which
permits, for example, asymmetry in the amount
of information possessed by different agents [34]
[35], the cost of gathering information [36], and
imperfections in gathering and processing information
[3].

Social influence is arguably ubiquitous among 
the human species [37]. In fact, rather than the
agent's cost-benefit analysis that has served as 
a null hypothesis for rationality for over a cen- 
tury, an alternative is that each agent uses (con- 
sciously or not) the decisions of others as a basis 
for his or her own decisions. 

The social-influence model we have presented 
allows choices among multiple possible alterna- 
tives, which rise and fall in relative popularity 
over time, rather than binary, 'either-or' deci- 
sions. This is truly reflective of human interac- 
tions such as the choice of a popular name for 
a child, the citation of an academic paper, or 
movement to a city where others have chosen
to live. Indeed, these phenomena are inherently
defined by the past decisions of others, without
which there would be no cities, familiar names,
or popular culture.